A Novel Framework and Model for Data
Authors
Abstract
Data cleansing identifies corrupt and duplicate data in the data sets of a data warehouse in order to improve data quality. This paper aims to facilitate the data cleaning process by addressing duplicate record detection in the 'name' attributes of the data sets. It provides a sequence of algorithms, organized in a novel framework, for identifying duplicates in the 'name' attribute of the data sets of an existing data warehouse. The key features of the research are its proposal of a novel framework through a well-defined sequence of algorithms and its refinement of the application of alliance rules [1] by incorporating existing, well-defined similarity computation measures. The results show the feasibility and validity of the suggested method.
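As a rough illustration of duplicate detection on a 'name' attribute with a similarity measure, the following minimal Python sketch flags candidate duplicate pairs. It is not the paper's framework or its alliance rules, which the abstract does not specify. The normalization step, the use of the standard library's `difflib.SequenceMatcher` ratio as the similarity measure, the 0.85 threshold, the function names, and the sample records are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase, replace periods and commas with spaces, collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names (assumed measure)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicate_names(names, threshold=0.85):
    """Return (i, j, score) for every pair of names scoring at or above threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs


# Hypothetical sample records, not taken from the paper.
records = ["John A. Smith", "john a smith", "Jon Smith", "Mary Jones"]
candidates = find_duplicate_names(records)
```

The pairwise comparison is quadratic in the number of records, so a real warehouse would likely need blocking or indexing first. The threshold is a tuning choice, and pairs near it should be treated as candidates for review rather than confirmed duplicates.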
Similar resources
Polarimetric Synthetic Aperture Radar Image Classification using Bag of Visual Words Algorithm
Land cover is defined as the physical material of the surface of the earth, including different vegetation covers, bare soil, water surface, various urban areas, etc. Land cover and its changes are very important and influential on the Earth and life of living organisms, especially human beings. Land cover change monitoring is important for protecting the ecosystem, forests, farmland, open spac...
A novel grey–fuzzy–Markov and pattern recognition model for industrial accident forecasting
Industrial forecasting is a top-echelon research domain, which has over the past several years experienced highly provocative research discussions. The scope of this research domain continues to expand due to the continuous knowledge ignition motivated by scholars in the area. So, more intelligent and intellectual contributions on current research issues in the accident domain will potentially ...
A new framework for high-technology project evaluation and project portfolio selection based on Pythagorean fuzzy WASPAS, MOORA and mathematical modeling
High-technology projects are known as tools that help achieve productive forces through scientific and technological knowledge. These knowledge-based projects are associated with high levels of risk and return. The process of high-technology project and project portfolio selection involves technical complexities and uncertainties. This paper presents a novel two-part method of high-technology ...
A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset
Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to the weights of attributes is proposed. A comprehensive review of electrocardiogram signal classification and segmentation methodologies indicates that algorithms able to effectively handle the nonstationarity and uncertainty of the signals should be used for ECG analysis. Ex...
Multi-period and Multi-objective Stock Selection Optimization Model Based on Fuzzy Interval Approach
The optimization of investment portfolios is the most important topic in financial decision making, and many relevant models can be found in the literature. Given the importance of portfolio optimization, this paper deals with novel solution approaches to solve a newly developed portfolio optimization model. Contrary to previous work, the uncertainty of future retur...
University Business Model Framework
The purpose of this study is to provide a framework for the university business model as a solution for universities to cooperate with businesses. The study is a qualitative case study; document analysis and focus groups were used to collect data. In the documentation section, 60 documents related to academic business models were selected and an...
Journal title:
Volume, issue
Pages -
Publication date: 2011